\(~\)
\(~\)
This session introduces how to visualize our data. There are two major sets of tools for creating plots in R: base graphics, which ship with every R installation, and ggplot2, a stand-alone package that is part of the tidyverse.
\(~\)
We will focus on ggplot2 in this class.
\(~\)
Policy advocacy should rely heavily on data. Drawing a figure (a.k.a. visualization) is often a critical step, and a good figure can sometimes reveal a pattern more clearly than conventional statistical computations. A figure is powerful in itself.
\(~\)
For the following examples, we will be using the gapminder dataset. Gapminder is a country-year dataset with information on life expectancy, among other things.
\(~\)
If you have not already installed the gapminder package
and you try to load it using the following code, you will get an
error:
\(~\)
library(gapminder)
Error in library(gapminder) : there is no package called ‘gapminder’
\(~\)
If this happens, install the gapminder package by
running install.packages("gapminder") in your console.
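If you prefer a script that works whether or not the package is already present, a minimal sketch is shown below. It assumes an internet connection, and the CRAN mirror URL is only one possible choice:

```r
# Install gapminder only when it is missing, then load it.
# The repos URL is an assumption; any CRAN mirror should work.
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder", repos = "https://cloud.r-project.org")
}
library(gapminder)
```

This avoids reinstalling the package every time the script is run.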
\(~\)
Once you’ve done this, run the following code to load the
gapminder dataset and the tidyverse library,
which includes ggplot2:
\(~\)
library(tidyverse)
library(gapminder)
## Warning: package 'gapminder' was built under R version 4.0.2
gap <- gapminder
head(gap)
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
\(~\)
\(~\)
Once you load the data, based on what we’ve learned in previous classes, discuss the following questions within your group.
\(~\)
(Hint: You can also run ?gapminder in the console to
open the help file for the data and definitions for each of the
columns.)
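A few base R calls can help you get oriented before plotting. This is only a sketch of one way to inspect the data, and the object name gap follows the code above:

```r
library(gapminder)
gap <- gapminder

dim(gap)              # number of rows and columns
names(gap)            # column names
range(gap$year)       # first and last year covered
nlevels(gap$country)  # number of distinct countries
summary(gap$lifeExp)  # distribution of life expectancy
```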
\(~\)
\(~\)
The general call for ggplot2 looks like this:
\(~\)
ggplot(data =, aes(x = , y = )) +
geom_xxxx() +
geom_yyyy()
\(~\)
The grammar involves some basic components:
\(~\)
The key to understanding ggplot2 is thinking about a
figure in layers, just as you might in an image
editing program like Photoshop.
\(~\)
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp)) +
geom_point()
\(~\)
So the first thing we do is call the ggplot function.
This function lets R know that we’re creating a new plot, and any of the
arguments we give the ggplot function are the global
options for the plot: they apply to all layers on the plot.
\(~\)
For the second argument, we passed in the aes function,
which tells ggplot how variables in the data map to
aesthetic properties of the figure, in this case the x and y locations.
Here we told ggplot we want to plot the
gdpPercap column of the gapminder data frame on the x-axis,
and the lifeExp column on the y-axis.
\(~\)
Notice that we didn’t need to explicitly pass aes these columns (e.g., x = gap$gdpPercap). This is because ggplot is smart enough to look in the data for those columns!
\(~\)
Then, we need to tell ggplot how we want to visually represent the
data, which we do by adding a new geom layer. In our
example, we used geom_point, which tells ggplot we want to
visually represent the relationship between x and y as a scatterplot of
points:
\(~\)
IMPORTANT: In ggplot, you are adding
layers, so you should use + to separate each line of
code!
\(~\)
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp)) +
geom_point()
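Because layers are added with +, a plot can also be stored in an object and extended later. The sketch below illustrates this; the object names base_plot and scatter are made up for the example:

```r
library(ggplot2)
library(gapminder)
gap <- gapminder

# Start with the data and the global aesthetic mapping only (no layers yet)
base_plot <- ggplot(data = gap, aes(x = gdpPercap, y = lifeExp))

# Add a point layer to the stored plot
scatter <- base_plot + geom_point()

length(base_plot$layers)  # number of layers before adding the geom
length(scatter$layers)    # number of layers after adding the geom

scatter  # printing the object draws the plot
```

Printing base_plot on its own draws only the empty axes, which shows that ggplot() by itself does not decide how the data are represented.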
\(~\)
\(~\)
\(~\)
aes\(~\)
In the previous examples and challenge we’ve used the
aes function to tell the scatterplot geom
about the x and y locations of each
point. Another aesthetic property we can modify is the point
color.
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point()
Next, we can add a layer that sets the colors manually. You can look up the R color palette online for the full list of color names and codes.
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point() +
scale_color_manual(values = c("gold", "lightblue", "red", "lightgreen", "pink"))
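When colors are given as an unnamed vector, they are matched to the continents in factor-level order. One way to make the pairing explicit, sketched here with a hypothetical vector called continent_colors, is to name each color:

```r
library(ggplot2)
library(gapminder)
gap <- gapminder

# Named vector: each continent is paired with its color explicitly
continent_colors <- c(
  Africa   = "gold",
  Americas = "lightblue",
  Asia     = "red",
  Europe   = "lightgreen",
  Oceania  = "pink"
)

p_named <- ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_color_manual(values = continent_colors)

p_named
```

With names in place, reordering the vector does not change which continent gets which color.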
Furthermore, you can modify the opacity of points by
alpha in your geom_point setting.
alpha is in a range from 0 to 1.
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(alpha = 0.5)
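Note the difference between mapping an aesthetic to a variable (inside aes) and setting it to a fixed value (outside aes, as with alpha above). A small sketch, in which the color "steelblue" is an arbitrary choice:

```r
library(ggplot2)
library(gapminder)
gap <- gapminder

# Setting: every point gets the same color, so no legend is drawn
p_fixed <- ggplot(data = gap, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "steelblue", alpha = 0.5)

# Mapping: color varies with the continent column, so a legend is drawn
p_mapped <- ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5)

p_fixed
p_mapped
```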
Color isn’t the only aesthetic we can use to display variation in the data. We can also vary shape, size, etc. For example, we can map shape to continent as well.
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent, shape = continent)) +
geom_point(alpha = 0.5)
\(~\)
\(~\)
In the previous challenge, you plotted lifeExp over time.
Using a scatterplot probably isn’t the best for visualizing change over
time. Instead, let’s tell ggplot to visualise the data as a
line plot:
ggplot(data = gap, aes(x = year, y = lifeExp, group = country, color = continent)) +
geom_line()
Instead of adding a geom_point layer, we’ve added a
geom_line layer. We’ve also added the group aesthetic,
which tells ggplot to draw a line for each country.
\(~\)
But what if we want to visualize both lines and points on the plot? We can simply add another layer to the plot:
ggplot(data = gap, aes(x = year, y = lifeExp, group = country, color = continent)) +
geom_line() +
geom_point()
It’s important to note that each layer is drawn on top of the previous layer. In this example, the points have been drawn on top of the lines. Here’s another demonstration:
ggplot(data = gap, aes(x = year, y = lifeExp, group = country)) +
geom_line(aes(color = continent)) +
geom_point()
In this example, the aesthetic mapping of color has
been moved from the global plot options in ggplot to the
geom_line layer so it no longer applies to the points. Now
we can clearly see that the points are drawn on top of the lines.
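To see that drawing order follows the order of the layers, the two geoms can be swapped. This is a sketch of the same plot with the layers reversed, so the colored lines are drawn over the black points:

```r
library(ggplot2)
library(gapminder)
gap <- gapminder

# Points first, lines second: the lines now sit on top of the points
p_reversed <- ggplot(data = gap, aes(x = year, y = lifeExp, group = country)) +
  geom_point() +
  geom_line(aes(color = continent))

p_reversed
```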
\(~\)
\(~\)
\(~\)
\(~\)
Labels are considered to be their own layers in ggplot.
You can use labs(x = , y = , title = ) to set your
labels.
# add x and y axis labels
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color=continent)) +
geom_point(alpha = 0.5) +
labs(x = "GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development, by Continent")
You can also modify the theme of your plots. The themes in ggplot
include theme_bw(), theme_classic(),
theme_light(), theme_void(), etc. I recommend
theme_few() from the ggthemes package, which must be installed and loaded separately.
# load ggthemes, then apply theme_few()
library(ggthemes)
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(alpha = 0.5) +
labs(x = "GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development, by Continent") +
theme_few()
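The built-in themes need no extra package. As a minimal sketch, here is the same scatterplot with theme_bw(), which ships with ggplot2:

```r
library(ggplot2)
library(gapminder)
gap <- gapminder

p_bw <- ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  labs(x = "GDP per capita (in US$)", y = "Life Expectancy (in years)") +
  theme_bw()  # built-in theme: white background with a panel border

p_bw
```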
\(~\)
\(~\)
\(~\)
\(~\)
In ggplot, we can change the scale of units on the
x-axis using the scale functions. These control the mapping between the
data values and visual values of an aesthetic.
ggplot(data = gap, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(alpha = 0.5) +
scale_x_log10() + # this puts the x-axis on a log10 scale
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development, by Continent") +
theme_few()
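We can also apply the transformation ourselves inside the global aes() call. Note that log() is the natural log, so the axis values differ from those of scale_x_log10(). For example,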
# Here I take the natural log transformation on GDP per capita
ggplot(data = gap, aes(x = log(gdpPercap), y = lifeExp, color = continent)) +
geom_point(alpha = 0.5) +
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development, by Continent") +
theme_few()
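The two approaches differ in more than the base of the logarithm: scale_x_log10() keeps the axis labels in original units, while log(gdpPercap) changes the plotted values themselves. The small check below, using made-up values in x, shows how the two logs relate:

```r
x <- c(100, 1000, 10000)

log10(x)  # base-10 log: 2 3 4
log(x)    # natural log of the same values

# The two differ only by a constant factor, log(10),
# so the shape of the scatterplot is the same either way.
all.equal(log(x), log10(x) * log(10))
```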
Lastly, you can use the data wrangling functions that we just learned to choose the data you want. For example, suppose we only care about the data on African countries in 2007.
# filter by all African countries in 2007
gap %>%
filter(continent == "Africa" & year == 2007) %>%
ggplot(aes(x = log(gdpPercap), y = lifeExp)) +
geom_point(alpha = 0.5) +
geom_smooth(method = "lm") + # we can even add a regression line
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development in Africa, 2007") +
theme_few()
Pay attention here: when we use dplyr and pipes, we use
%>% to connect lines; however, in ggplot, we use
+ instead!
You can also vary the shape and color of specific points, for example to highlight one country.
# we want to highlight Rwanda
gap %>%
filter(continent == "Africa" & year == 2007) %>%
mutate(rwanda = ifelse(country == "Rwanda", "Rwanda", "Others")) %>%
ggplot(aes(x = log(gdpPercap), y = lifeExp)) +
geom_point(aes(shape = rwanda, color = rwanda), alpha = 0.5) +
geom_smooth(method = "lm", show.legend = FALSE) +
scale_color_manual(values = c("black", "red")) +
scale_shape_manual(values = c(16, 8)) +
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development in Africa") +
theme_few()
\(~\)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## Warning: Please use `linewidth` instead.
Previously, we visualized the change in life expectancy over time across all countries in one plot. Alternatively, we can split this out over multiple panels by adding a layer of facet panels.
\(~\)
facet_wrap() is a useful tool to display patterns for
different groups. For example:
ggplot(data = gap, aes(x = log(gdpPercap), y = lifeExp, color = continent)) +
geom_point(alpha = 0.5) +
facet_wrap(~ continent) +
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development, by Continent") +
theme_few()
\(~\)
If we would like to compare the five continents in the same row, we can
use ncol = or nrow = to set how many columns or rows
the facets are arranged in.
\(~\)
ggplot(data = gap, aes(x = log(gdpPercap), y = lifeExp, color = continent)) +
geom_point(alpha = 0.5) +
facet_wrap(~ continent, ncol = 5) +
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development, by Continent") +
theme_few()
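The same single-row layout can be requested through nrow instead. A minimal sketch, leaving out the labels and theme for brevity:

```r
library(ggplot2)
library(gapminder)
gap <- gapminder

# nrow = 1 places all five continent panels side by side in one row
p_row <- ggplot(data = gap, aes(x = log(gdpPercap), y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ continent, nrow = 1)

p_row
```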
Let’s go back to the Rwanda example. Let’s facet_wrap by year.
gap %>%
filter(continent == "Africa") %>%
mutate(rwanda = ifelse(country == "Rwanda", "Rwanda", "Others")) %>%
ggplot(aes(x = log(gdpPercap), y = lifeExp)) +
geom_point(aes(shape = rwanda, color = rwanda), alpha = 0.5) +
geom_smooth(method = "lm") +
facet_wrap(~year) +
scale_color_manual(values = c("black", "red")) +
scale_shape_manual(values = c(16, 8)) +
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development in Africa") +
theme_few()
\(~\)
\(~\)
Legends are more complicated than axes. Because:
\(~\)
\(~\)
By default, a layer will only appear in the legend if the corresponding aesthetic
is mapped to a variable with aes(). You can override
whether or not a layer appears in the legend with
show.legend: FALSE prevents a layer from ever appearing
in the legend; TRUE forces it to appear when it otherwise
wouldn’t.
\(~\)
gap %>%
filter(continent == "Africa") %>%
mutate(rwanda = ifelse(country == "Rwanda", "Rwanda", "Others")) %>%
ggplot(aes(x = log(gdpPercap), y = lifeExp)) +
geom_point(aes(shape = rwanda, color = rwanda), alpha = 0.5, show.legend = FALSE) + # HERE!
geom_smooth(method = "lm") +
facet_wrap(~year) +
scale_color_manual(values = c("black", "red")) +
scale_shape_manual(values = c(16, 8)) +
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development in Africa") +
theme_few()
## `geom_smooth()` using formula = 'y ~ x'
\(~\)
You can also change the location of the legend with the theme()
function. The position of the legend is controlled by
the theme setting legend.position, which takes the values
“right”, “left”, “top”, “bottom”, or “none” (no legend).
\(~\)
gap %>%
filter(continent == "Africa") %>%
mutate(rwanda = ifelse(country == "Rwanda", "Rwanda", "Others")) %>%
ggplot(aes(x = log(gdpPercap), y = lifeExp)) +
geom_point(aes(shape = rwanda, color = rwanda), alpha = 0.5) +
geom_smooth(method = "lm") +
facet_wrap(~year) +
scale_color_manual(values = c("black", "red")) +
scale_shape_manual(values = c(16, 8)) +
labs(x = "Logged GDP per capita (in US$)", y = "Life Expectancy (in years)",
title = "Relations of Life Expectancy and Economic Development in Africa",
color = "", shape = "") +
theme_few() +
theme(legend.position = "bottom") # position
## `geom_smooth()` using formula = 'y ~ x'
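Setting legend.position to "none" removes every legend from the plot at once, which is an alternative to passing show.legend = FALSE layer by layer. A minimal sketch on the continent scatterplot:

```r
library(ggplot2)
library(gapminder)
gap <- gapminder

p_nolegend <- ggplot(data = gap, aes(x = log(gdpPercap), y = lifeExp, color = continent)) +
  geom_point(alpha = 0.5) +
  theme(legend.position = "none")  # drop all legends from the plot

p_nolegend
```

This is most useful when the grouping is already clear from facets or from the title.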